Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
                                            Some full text articles may not yet be available without a charge during the embargo (administrative interval).
                                        
                                        
                                        
                                            
                                                
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            The demand for Deep Neural Network (DNN) execution (including both inference and training) on mobile systems-on-a-chip (SoCs) has surged, driven by factors such as the need for real-time latency, privacy, and reducing vendors’ costs. Mainstream mobile GPUs (e.g., Qualcomm Adreno GPUs) usually have a 2.5D L1 texture cache that offers throughput superior to that of on-chip memory. However, to date, there is limited understanding of the performance features of such a 2.5D cache, which limits the optimization potential. This paper introduces TMModel, a framework with three components: 1) a set of micro-benchmarks and a novel performance assessment methodology to characterize a poorly documented architecture with 2D memory, 2) a complete analytical performance model configurable for different data access pattern(s), tiling size(s), and other GPU execution parameters for a given operator (and associated size and shape), and 3) a compilation framework incorporating this model and generating optimized code with low overhead. TMModel is validated both on a set of DNN kernels and for training complete models on mobile GPUs. Free, publicly-accessible full text available June 9, 2026.
- 
            Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning
            Recent algorithms have achieved superhuman performance at a number of two-player zero-sum games such as poker and Go. However, many real-world situations are multi-player games. Zero-sum two-team games, such as bridge and football, involve two teams where each member of the team shares the same reward with every other member of that team, and each team has the negative of the reward of the other team. A popular solution concept in this setting, called TMECor, assumes that teams can jointly correlate their strategies before play, but are not able to communicate during play. This setting is harder than two-player zero-sum games because each player on a team has different information and must use their public actions to signal to other members of the team. Prior works either have game-theoretic guarantees but only work in very small games, or are able to scale to large games but do not have game-theoretic guarantees. In this paper we introduce two algorithms: Team-PSRO, an extension of PSRO from two-player games to team games, and Team-PSRO Mix-and-Match, which improves upon Team-PSRO by better using population policies. In Team-PSRO, in every iteration both teams learn a joint best response to the opponent’s meta-strategy via reinforcement learning. As the reinforcement learning joint best response approaches the optimal best response, Team-PSRO is guaranteed to converge to a TMECor. In experiments on Kuhn poker and Liar’s Dice, we show that a tabular version of Team-PSRO converges to TMECor, and a version of Team-PSRO using deep cooperative reinforcement learning beats self-play reinforcement learning in the large game of Google Research Football.
- 
            A reversible diffusion process is initialized at position x0 and run until it first hits any of several targets. What is the probability that it terminates at a particular target? We propose a computationally efficient approach for estimating this probability, focused on those situations in which it takes a long time to hit any target. In these cases, direct simulation of the hitting probabilities becomes prohibitively expensive. On the other hand, if the timescales are sufficiently long, then the system will essentially “forget” its initial condition before it encounters a target. In these cases the hitting probabilities can be accurately approximated using only local simulations around each target, obviating the need for direct simulations. In empirical tests, we find that these local estimates can be computed in the same time it would take to compute a single direct simulation, but that they achieve an accuracy that would require thousands of direct simulation runs.
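To make the "direct simulation" baseline in the last abstract concrete, here is a minimal sketch. It is not the authors' local-simulation method. It assumes a toy case that does not appear in the abstract: driftless Brownian motion on the interval [0, 1], with the two endpoints as targets, discretized with Euler-Maruyama steps. The function name `hit_probability` and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def hit_probability(x0, n_paths=2000, dt=1e-3, seed=0):
    """Estimate P(reach 1 before 0) for dX = dW started at x0.

    Toy assumption: the targets are the endpoints of [0, 1]. Each path is
    simulated with Euler-Maruyama increments until it leaves the interval.
    """
    rng = random.Random(seed)
    step = math.sqrt(dt)  # standard deviation of one time-step increment
    hits = 0
    for _ in range(n_paths):
        x = x0
        while 0.0 < x < 1.0:
            x += step * rng.gauss(0.0, 1.0)
        if x >= 1.0:
            hits += 1
    return hits / n_paths


# For this toy case the exact answer is x0 itself, which gives a sanity check.
estimate = hit_probability(0.3)
print(f"estimated P(hit 1 first) = {estimate:.3f} (exact: 0.300)")
```

In this toy case every path exits quickly, so the estimate is cheap. The cost the abstract describes arises when a single path takes a very long time to reach any target, since the inner `while` loop then dominates the run time and thousands of such paths are needed for a comparable accuracy.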
 An official website of the United States government
                                     Full Text Available